Rubellimicrobium mesophilum Dastager et al. 2008 is a mesophilic and light reddish-pigmented
representative of the Roseobacter group within the alphaproteobacterial family
Rhodobacteraceae. Representatives of the Roseobacter group play an important role in
marine biogeochemical cycles and have been found in a broad variety of marine environments,
associated with algal blooms, different kinds of sediments, and surfaces of invertebrates and
vertebrates. Roseobacters were shown to be widely distributed, especially within the total
bacterial community found in coastal waters, as well as in mixed water layers of the open
ocean. Here we describe the features of R. mesophilum strain MSL-20T together with its
genome sequence and annotation generated from a culture of DSM 19309T. The 4,927,676 bp
genome sequence consists of one chromosome and probably one extrachromosomal element.
It contains 5,082 protein-coding genes and 56 RNA genes. As the previously reported
G+C content differs significantly from the genome sequence-based G+C content,
and as the type strain tests positive for oxidase, the species description is emended
accordingly. The genome was sequenced as part of the activities of the Transregional Collaborative
Research Centre 51 (TRR51) funded by the German Research Foundation (DFG).



Introduction 

Strain MSL-20T (= DSM 19309T = KCTC 22012T) is
the type strain of the species Rubellimicrobium
mesophilum [1], one of four species with validly
published names in the genus Rubellimicrobium
[2,3]; the other three species in the genus are R.
thermophilum [3], R. aerolatum [4] and R. roseum
[5]. Rubellimicrobium belongs to the abundant
marine Roseobacter group [6]. The species epithet
mesophilum is derived from the Greek adjective mesos,
middle, and the Neo-Latin adjective
'philus -a -um', friend/loving [1], meaning the middle
(temperature-) loving. Strain MSL-20T was isolated
from soil located at Bigeum Island, Republic of
Korea [1], whereas the other type strains within
the genus Rubellimicrobium were isolated from a
paper mill (R. thermophilum [3]), air (R. aerolatum
[4]) and forest soil (R. roseum [5]), which indicates
rather diverse habitats for Rubellimicrobium.
Current PubMed records do not indicate any
follow-up research with strain MSL-20T since the initial
description of R. mesophilum [1]. Here we present
a summary classification and a set of features for
R. mesophilum MSL-20T, together with the
description of the genomic sequencing and
annotation.

Classification and features 
16S rRNA gene analysis

Figure 1 shows the phylogenetic neighborhood of
R. mesophilum in a 16S rRNA gene sequence-based
tree. The sequence of the single 16S rRNA gene in
the DSM 19309T genome does not differ from the
previously published 16S rRNA gene sequence
(EF547368), which contains four ambiguous base
calls.

The genomic 16S rRNA gene sequence of R.
mesophilum DSM 19309T was compared with the
Greengenes database for determining the
weighted relative frequencies of taxa and
(truncated) keywords as previously described [7]. The
most frequently occurring genera were
Paracoccus (45.3%), Loktanella (30.3%),
Rubellimicrobium (14.0%), Methylarcula (8.4%)
and 'Pararubellimicrobium' (2.0%) (58 hits in
total). Regarding the five hits to sequences from
other members of the genus, the average identity
within HSPs was 94.9%, whereas the average
coverage by HSPs was 99.3%. Among all other
species, the one yielding the highest score was
'Pararubellimicrobium aerilata' (EU338486),
which corresponded to an identity of 96.3% and an
HSP coverage of 98.0%. (Note that the Greengenes
database uses the INSDC (=EMBL/NCBI/DDBJ)
annotation, which is not an authoritative source
for nomenclature or classification.) The
highest-scoring environmental sequence was JF417792
(Greengenes short name 'microbial structures
coalbeds located Eerduosi Basin China coalbed
clone QQSB73'), which showed an identity of
98.7% and an HSP coverage of 99.6%. The most
frequently occurring keywords within the labels
of all environmental samples which yielded hits
were 'skin' (10.6%), 'fossa' (5.9%), 'poplit' (4.2%),
'forearm, volar' (3.3%) and 'sea' (2.8%) (192 hits
in total). Environmental samples which yielded
hits of a higher score than the highest scoring
species were not found, indicating that R. mesophilum
has rarely been detected in the environment.



[Figure 1: phylogenetic tree; tip labels as shown in the figure]
Rubellimicrobium mesophilum (EF547368)
Rubellimicrobium aerolatum (EU338486)
Rubellimicrobium roseum (GU109478)
Rubellimicrobium thermophilum (AJ844281) *
Citreicella aestuarii (FJ230833)
Citreicella marina (EU928765)
Citreicella thiooxidans (AY639887)
Wenxinia marina (IMG2520734673) *



Figure 1. Phylogenetic tree highlighting the position of R. mesophilum relative to the type strains of the other species
within the genus Rubellimicrobium and the neighboring genera Citreicella and Wenxinia. The tree was inferred from
1,381 aligned characters of the 16S rRNA gene sequences under the maximum likelihood (ML) criterion as previously
described [7]. The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent
to the branches are support values from 1,000 ML bootstrap replicates (left) and from 1,000 maximum-parsimony
bootstrap replicates (right) if larger than 60% [7]. Lineages with type strain genome sequencing projects
registered in GOLD [8] are labeled with one asterisk [9].
Morphology and physiology

Cells of strain MSL-20T stain Gram-negative, are
described to be motile (without a flagellum) [1],
and ovoid or rod-shaped, 1.6-3.4 μm in length and
0.4-0.7 μm in width (Figure 2 and Table 1). On
Reasoner's 2A (R2A) agar they form pink to light
red-pigmented colonies. According to [1], cells are
negative for oxidase (but see below) and nitrate
reduction activities, and show only weak catalase
activity. They hydrolyze starch and Tween 80,
assimilate cellulose, histidine, leucine and fructose,
but do not utilize citrate and propionate. Cells test
positive for leucine arylamidase, naphthol-AS-BI-phosphohydrolase
and α-glucosidase. Growth is
observed in a temperature range of 20-37°C with
an optimum at 28°C. The pH range for growth is
pH 7-11 with an optimum at pH 7.0 ± 0.2.
No growth occurs in the presence of NaCl in
concentrations of 0.5% and above. Cells of strain
MSL-20T do not utilize the substrates cellobiose,
D-mannose, salicin, D-xylose, α-melibiose,
D-sorbitol, L-malate and D-ribose, which are utilized
by its close relative R. thermophilum DSM 16684T
(all data from [1]).




Chemotaxonomy 

The principal cellular fatty acids of strain MSL-20T
are C16:0 (36.9%), C18:1 ω7c (36.5%), 11-methyl C18:1
ω7c (12.4%), C18:0 (3.6%), C10:0 (1.3%), C12:0 (1.3%)
and C17:0 (1.2%) and differ significantly from those
detected in R. thermophilum. The major respiratory
lipoquinone is ubiquinone Q-10, which is a
common feature of alphaproteobacterial
representatives (all data from [1]).

Genome sequencing and annotation 

Genome project history 

The genome of R. mesophilum strain DSM 19309T
was first selected for genome sequencing in phase
I of the one thousand microbial genomes (KMG-I)
project [20], an extension of the Genomic
Encyclopaedia of Bacteria and Archaea (GEBA) [21],
but was ultimately sequenced within the DFG-funded
project "Ecology, Physiology and Molecular Biology
of the Roseobacter clade: Towards a Systems
Biology Understanding of a globally Important
Clade of Marine Bacteria". The strain was chosen
for genome sequencing according to a phylogeny-driven
target selection procedure for large-scale
genome-sequencing (and other) projects as
routinely used for the KMG-I project [20,22].

The project information can be found in the
Genome OnLine Database [8]. The Whole Genome
Shotgun (WGS) sequence is deposited in GenBank
and the Integrated Microbial Genomes database
(IMG) [23]. A summary of the project information
is shown in Table 2.
Table 1. Classification and general features of R. mesophilum MSL-20T according to the MIGS recommendations [10]
published by the Genomic Standards Consortium [11].

MIGS ID     Property                  Term                                  Evidence code
            Current classification    Domain Bacteria                       TAS [12]
                                      Phylum Proteobacteria                 TAS [13]
                                      Class Alphaproteobacteria             TAS [14,15]
                                      Order Rhodobacterales                 TAS [15,16]
                                      Family Rhodobacteraceae               TAS [15,17]
                                      Genus Rubellimicrobium                TAS [3]
                                      Species Rubellimicrobium mesophilum   TAS [1]
                                      Strain MSL-20T                        TAS [1]
            Gram stain                negative                              TAS [1]
            Cell shape                irregular rod-shaped                  TAS [1]
            Motility                  motile                                TAS [1]
            Sporulation               non-sporulating                       NAS
            Temperature range         20-37°C                               TAS [1]
            Optimum temperature       28°C                                  TAS [1]
            Salinity                  stenohaline                           TAS [1]
MIGS-22     Oxygen requirement        aerobic                               TAS [1]
            Carbon source             carbohydrates, amino acids            TAS [1]
            Energy metabolism         chemoorganotroph                      NAS
MIGS-6      Habitat                   soil                                  TAS [1]
MIGS-15     Biotic relationship       free living                           TAS [1]
MIGS-14     Pathogenicity             none                                  NAS
            Biosafety level           1                                     TAS [18]
MIGS-23.1   Isolation                 soil                                  TAS [1]
MIGS-4      Geographic location       Bigeum island (Republic of Korea)     TAS [1]
MIGS-5      Sample collection time    April 2006                            NAS
MIGS-4.1    Latitude                  34.739                                NAS
MIGS-4.2    Longitude                 125.920                               NAS
MIGS-4.3    Depth                     not reported
MIGS-4.4    Altitude                  not reported

Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable
Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted
property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [19].
Table 2. Genome sequencing project information

MIGS ID     Property                     Term
MIGS-31     Finishing quality            Non-contiguous finished
MIGS-28     Libraries used               Two genomic libraries: one Illumina PE library (420 bp
                                         insert size), one 454 PE library (3 kb insert size)
MIGS-29     Sequencing platforms         Illumina GA IIx, Illumina MiSeq, 454 GS-FLX+Titanium
MIGS-31.2   Sequencing coverage          129x
MIGS-30     Assemblers                   Velvet version 1.1.36, Newbler version 2.3, Consed 20.0
MIGS-32     Gene calling method          Prodigal 1.4
            INSDC ID                     AOSK00000000
            GenBank Date of Release      pending publication
            GOLD ID                      Gi0042374
            NCBI project ID              188767
            Database: IMG                2523533591
MIGS-13     Source material identifier   DSM 19309T
            Project relevance            Tree of Life, biodiversity



Growth conditions and DNA isolation 

A culture of DSM 19309T was grown aerobically in
DSMZ medium 830 (R2A medium) [24] at 28°C.
Genomic DNA was isolated using the Jetflex Genomic
DNA Purification Kit (GENOMED 600100)
following the standard protocol provided by the
manufacturer, but modified by an incubation time of 60
min, an overnight incubation on ice on a shaker,
the use of an additional 50 μl proteinase K, and the
addition of 100 μl protein precipitation buffer.
DNA is available from DSMZ through the DNA
Bank Network [25].

Genome sequencing and assembly 

The genome was sequenced using a combination
of two libraries (Table 2). The paired-end library
contained inserts of an average of 420 bp in
length. Illumina sequencing was performed on a
GA IIx platform with 150 cycles. The first run on
the Illumina GA IIx platform delivered 3.6 million
reads. In order to increase the sequencing depth, a
second Illumina run was performed, providing
another 7.0 million reads. Error correction and
clipping were performed by fastq-mcf [26] and
quake [27]. The data were assembled using Velvet
[28]. The first draft assembly from 5,400,234
filtered reads (median read length of 132 nt)
resulted in more than 143 unordered contigs. To gain
information about the contig arrangement, an
additional 454 run was performed. The paired-end
jumping library of 3 kb insert size was sequenced
on 1/8 of a lane. Pyrosequencing resulted in
102,695 reads with an average read length of 199
bp, assembled with Newbler (Roche Diagnostics).
The resulting assembly consisted of 261 scaffolds.
Both draft assemblies (Illumina and 454 sequences)
were fractionated into artificial Sanger reads
of 1,000 nt in length plus 75 bp overlap on each
side. These artificial reads served as an input for
the phred/phrap/consed package [29]. By manual
editing, 138 contigs could be assembled on 127
scaffolds. The combined sequences provided a
129x coverage of the genome.
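
The fractionation of draft contigs into artificial Sanger reads can be illustrated with a minimal Python sketch. It assumes one possible reading of the text: each piece covers a 1,000 nt window that is extended by 75 bases on both sides and clipped at the contig ends. The function name `shred` and its parameters are hypothetical and are not taken from the tools actually used.

```python
def shred(contig: str, core: int = 1000, overlap: int = 75) -> list[str]:
    """Cut a contig into overlapping pseudo-reads.

    Each piece spans a `core`-sized window extended by `overlap` bases
    on both sides, clipped at the contig boundaries (an assumed scheme).
    """
    pieces = []
    for start in range(0, len(contig), core):
        left = max(0, start - overlap)                    # extend to the left
        right = min(len(contig), start + core + overlap)  # extend to the right
        pieces.append(contig[left:right])
    return pieces


# Toy contig of 2,500 nt; real input would be the draft assembly contigs.
demo_contig = "ACGT" * 625
demo_lengths = [len(piece) for piece in shred(demo_contig)]
```

Under this scheme, neighboring pieces share 150 bases (75 from each side), which gives an assembler such as phrap enough overlap to rejoin them.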

Genome annotation 

Genes were identified using Prodigal [30] as part
of the JGI genome annotation pipeline. The
predicted CDSs were translated and used to search
the National Center for Biotechnology Information
(NCBI) non-redundant database, UniProt,
TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro
databases. Identification of RNA genes was carried out
by using HMMER 3.0rc1 [31] (rRNAs) and
tRNAscan-SE 1.23 [32] (tRNAs). Other non-coding
genes were predicted using INFERNAL 1.0.2 [33].
Additional gene prediction analysis and functional
annotation were performed within the Integrated
Microbial Genomes - Expert Review (IMG-ER)
platform [34]. CRISPR elements were detected
using CRT [35] and PILER-CR [36].
Genome properties

The genome statistics are provided in Table 3 and
Figure 3. The genome of strain DSM 19309T has a
total length of 4,927,676 bp and a G+C content of
69.7%. Of the 5,138 genes predicted, 5,082 were
identified as protein-coding genes, and 56 as
RNAs. The majority of the protein-coding genes
(56.7%) were assigned a putative function while
the remaining ones were annotated as hypothetical
proteins. The distribution of genes into COGs
functional categories is presented in Table 4.
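
As a quick consistency check, the percentages reported for the genome can be recomputed from the raw counts. The following Python sketch uses only the values stated in the text and in Table 3; the helper name `percent` is an illustrative assumption.

```python
# Counts as reported for strain DSM 19309T.
GENOME_SIZE_BP = 4_927_676
GC_BP = 3_431_981
CODING_BP = 4_254_404
TOTAL_GENES = 5_138
PROTEIN_CODING_GENES = 5_082
RNA_GENES = 56


def percent(part: int, whole: int, digits: int = 2) -> float:
    """Return `part` as a percentage of `whole`, rounded to `digits`."""
    return round(100 * part / whole, digits)


gc_percent = percent(GC_BP, GENOME_SIZE_BP)          # G+C content
coding_percent = percent(CODING_BP, GENOME_SIZE_BP)  # coding density
rna_percent = percent(RNA_GENES, TOTAL_GENES)        # share of RNA genes
```

The protein-coding and RNA gene counts add up to the total gene count, and the recomputed G+C value rounds to the 69.7% quoted above.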



Table 3. Genome statistics

Attribute                                     Value        % of Total
Genome size (bp)                              4,927,676    100.00
DNA coding region (bp)                        4,254,404    86.34
DNA G+C content (bp)                          3,431,981    69.65
Number of scaffolds (MIGS-9)                  127
Extrachromosomal elements (MIGS-10)           1
Total genes                                   5,138        100.00
RNA genes                                     56           1.09
rRNA operons                                  1
tRNA genes                                    45           0.88
Protein-coding genes                          5,082        98.91
Genes with function prediction (proteins)     2,915        56.73
Genes without function prediction (proteins)  2,167        42.18
Genes in paralog clusters                     4,172        81.20
Genes assigned to COGs                        3,818        74.31
Genes assigned Pfam domains                   3,977        77.40
Genes with signal peptides                    384          7.47
Genes with transmembrane helices              966          18.80
CRISPR repeats                                0


Figure 3. Graphical map of the largest, 267,932 bp long scaffold. From bottom to the top: Genes on forward 
strand (colored by COG categories), Genes on reverse strand (colored by COG categories), RNA genes (tRNAs 
green), GC content (black), GC skew (purple/olive). 
Table 4. Number of genes associated with the general COG functional categories

Code   Value   %age   Description
J      186     4.4    Translation, ribosomal structure and biogenesis
A      3       0.1    RNA processing and modification
K      279     6.6    Transcription
L      250     5.9    Replication, recombination and repair
B      4       0.1    Chromatin structure and dynamics
D      35      0.8    Cell cycle control, cell division, chromosome partitioning
Y      0       0.0    Nuclear structure
V      34      0.8    Defense mechanisms
T      176     4.2    Signal transduction mechanisms
M      223     5.3    Cell wall/membrane/envelope biogenesis
N      27*     0.6    Cell motility
Z      0       0.0    Cytoskeleton
W      0       0.0    Extracellular structures
U      51      1.2    Intracellular trafficking and secretion, and vesicular transport
O      148     3.5    Posttranslational modification, protein turnover, chaperones
C      251     6.0    Energy production and conversion
G      453     10.7   Carbohydrate transport and metabolism
E      478     11.3   Amino acid transport and metabolism
F      94      2.2    Nucleotide transport and metabolism
H      152     3.6    Coenzyme transport and metabolism
I      143     3.4    Lipid transport and metabolism
P      182     4.3    Inorganic ion transport and metabolism
Q      124     2.9    Secondary metabolites biosynthesis, transport and catabolism
R      508     12.0   General function prediction only
S      421     10.0   Function unknown
-      1,320   25.7   Not in COGs

*Only one gene each for flagellar motor and flagellar hook capping, no structural genes for flagella.



Insights into the genome 
Plasmids 

The identification of plasmids is difficult because
typical replication modules comprising the
characteristic replicase and the adjacent parAB
partitioning operon are missing [36]. However,
comprehensive BLASTP searches with plasmid
replicases from Rhodobacterales revealed the
presence of one RepB gene (rumeso_01479),
whereas RepA-, RepABC-type and DnaA-like
replicases are absent from the genome. The
localization of the chromosomal replication initiator
DnaA documents that scaffold 15 is part of the
chromosome (Table 5).

The 119 kb RepB-type plasmid contains a
post-segregational killing system (PSK) consisting of a
typical operon with two small genes encoding a
stable toxin and an unstable antitoxin
(rumeso_01477/78) [37].



Table 5. General genomic location and features of the chromosomal and one extrachromosomal replicon from R.
mesophilum strain DSM 19309T.

Replicon      Scaffold   Replicase   Length (bp)   GC (%)   Topology   No. Genes#
Chromosome¹   15         DnaA        102,082       71       linear*    105
Plasmid       9          RepB        119,205       68       linear*    141

* circularity not experimentally validated
# deduced from automatic annotation
¹ partial sequence including the replicase dnaA (rumeso_02152).






Standards in Genomic Sciences 



Riedel et al. 



Phages 

Phages are widely distributed and abundant in
marine and freshwater environments [38-40] and
are known to be horizontal gene transfer agents
that drive bacterial diversity [40,41]. Temperate
phage genomes can be integrated in the host
genome as prophages and form a symbiotic
relationship with their hosts [42].

Several phage-associated gene sequences were
detected in the genome sequence of strain DSM
19309T, particularly in "genomic islands" (e.g.,
rumeso_00405, rumeso_00407, rumeso_01586 to
rumeso_01600).

Quorum Sensing 

Several Gram-negative bacteria produce and
release chemical signal molecules called
autoinducers. Depending on the population
density, they detect those signal molecules and
respond with an alteration of gene expression and
therefore with diverse behaviors (e.g.,
luminescence, virulence, antibiotic resistance, changes in
morphology and cell division) [43-46].

Genome analysis of strain DSM 19309T revealed
the presence of genes associated
with the mechanism of quorum sensing, e.g., an
N-acyl-homoserine-lactone synthetase (rumeso_02218,
luxI homologue) and a response and transcriptional
regulator probably involved in the response to the
signal (rumeso_02217, luxR homologue).

Metabolic plasticity 

Unlike many representatives of the Roseobacter
group [6], R. mesophilum DSM 19309T encodes no
genes involved in the harvesting of light and
photoheterotrophic growth, which reflects its
occurrence in niches within soil that are characterized
by the absence of light. Nevertheless, the
annotated genome sequence reveals a high metabolic
versatility that was not expected from the phenotypic
characterization presented in the species
description [1].

The genome encodes a large number of diverse
ABC transporters facilitating the uptake of various
substrates like carbohydrates (e.g., rumeso_04497
to 04500), polyamines (e.g., rumeso_04716 to
04719), peptides (e.g., rumeso_00087 to 00090),
amino acids (e.g., rumeso_00231 to 00234) and
sulfonates (e.g., rumeso_05058 to 05059).
Sulfonates could represent unexpected but
common substrates for this species. The organic
sulfonates taurine and cysteic acid are widely
distributed in animal tissue and can enter soil via
feces. In some soil bacteria, these compounds are
used as sole source of carbon, nitrogen and sulfur
[47]. Indeed, a complete degradation pathway for
taurine was detected in the genome of strain DSM
19309T. Taurine is first converted by a
taurine-pyruvate aminotransferase (rumeso_05057) to
sulfoacetaldehyde, which in turn is cleaved by the
enzyme sulfoacetaldehyde acetyltransferase
(rumeso_03970) into sulfite and acetyl-phosphate.
Acetyl-phosphate can either be converted to
acetyl-CoA by a phosphotransacetylase
(rumeso_03968) and funneled into the
intermediary metabolism or be used for the generation of
ATP by the enzyme acetate kinase
(rumeso_03967). The potentially toxic compound
sulfite can be oxidized to sulfate by various sulfite
oxidases (e.g., rumeso_03951).

In addition, the utilization of electron acceptors
seems to be variable and not restricted to oxygen.
Genes encoding at least two predicted cytochrome
c oxidases, one of the cbb3-type (rumeso_00470 to
00472) and the other of the aa3-type
(rumeso_02204 to 02206), which terminate the
electron transport chain with oxygen, were
detected. Although, according to the species
description, strain MSL-20T should be oxidase negative
[1], we have found that the oxidase test for this
strain is positive, which is in line with the results
of the genome analysis.

Under periodic anoxic conditions that frequently
occur in wet soils, nitrate could be used as an
alternative electron acceptor. According to the genome
sequence, the denitrification pathway of this
strain is probably incomplete and terminates with
the greenhouse gas nitrous oxide (N2O), as has
been previously demonstrated for Ottowia
thiooxydans [48]. Only genes encoding a respiratory
nitrate reductase (rumeso_02471 to 02474),
nitrite reductase (rumeso_02669) and nitric oxide
reductase (rumeso_00142 to 00145) were detected,
whereas no genes for the terminal nitrous
oxide reductase were found.
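
The reasoning behind the predicted end product can be made explicit with a small Python sketch. It walks through the four canonical denitrification steps and stops at the first enzyme for which no gene was detected. The step table and the function name `end_product` are illustrative assumptions, not part of the annotation pipeline.

```python
# Canonical denitrification steps: (enzyme, substrate, product).
DENITRIFICATION_STEPS = [
    ("nitrate reductase", "NO3-", "NO2-"),
    ("nitrite reductase", "NO2-", "NO"),
    ("nitric oxide reductase", "NO", "N2O"),
    ("nitrous oxide reductase", "N2O", "N2"),
]


def end_product(detected: set[str], substrate: str = "NO3-") -> str:
    """Follow the pathway until an enzyme is missing; return what accumulates."""
    current = substrate
    for enzyme, educt, product in DENITRIFICATION_STEPS:
        if educt != current or enzyme not in detected:
            break  # pathway is interrupted at this step
        current = product
    return current


# Enzymes for which genes were reported in the genome of DSM 19309T.
detected_in_genome = {
    "nitrate reductase",
    "nitrite reductase",
    "nitric oxide reductase",
}
predicted = end_product(detected_in_genome)
```

With the three detected reductases the walk ends at N2O, in agreement with the interpretation given above; this is a prediction from gene content only and says nothing about expression or activity.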

Comparison of Rubellimicrobium genomes 

Recently the genome sequence of the type strain
for a second representative of the genus
Rubellimicrobium, R. thermophilum DSM 16684T,
became available [9]. Lifestyle, habitat and
preferred temperature range of R. thermophilum
differ significantly from those of R. mesophilum
[3]. The genome sequences of both strains were
compared using the digital DNA-DNA
hybridization (dDDH) tool GGDC server version 2.0, an
online tool provided through the DSMZ web pages
[49]. The resulting dDDH value of 19.3 ± 2.3%
according to distance formula 2 (as described in
[50]) confirmed that both strains belong to
distinct species.
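
The decision rule behind this statement can be sketched in a few lines of Python. The sketch assumes the conventional 70% DDH boundary for species delineation, which is not stated explicitly in the text, and treats the reported ± value as a symmetric interval; the function names are hypothetical.

```python
# Conventional DDH species boundary in percent (assumed here).
SPECIES_THRESHOLD = 70.0


def ddh_interval(estimate: float, half_width: float) -> tuple[float, float]:
    """Return the (lower, upper) bounds of a symmetric dDDH interval."""
    return (estimate - half_width, estimate + half_width)


def clearly_distinct_species(
    estimate: float, half_width: float, threshold: float = SPECIES_THRESHOLD
) -> bool:
    """True if even the upper interval bound stays below the threshold."""
    _, upper = ddh_interval(estimate, half_width)
    return upper < threshold


# Value reported for R. mesophilum versus R. thermophilum.
distinct = clearly_distinct_species(19.3, 2.3)
```

Because the upper bound of the interval is far below the threshold, the conclusion does not depend on the exact interval width.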

Figure 4 depicts the fraction of shared genes
between the two genome-sequenced
Rubellimicrobium type strains and the type strain
of Wenxinia marina [51], another closely related
member of the Roseobacter group (see Figure 1).
The number of pairwise shared genes was inferred with
the phylogenetic profiler tool of the IMG platform.
Homologous genes were detected with an E-value
cutoff of 10^-5 and a minimum identity of 30%.
Proportions of 56% and 45% of the gene count in
W. marina and R. mesophilum, respectively, are
shared between all three genomes. In the case of
R. thermophilum, a fraction of homologous genes
of 70% is present in the other two genomes. Very
few genes are shared only between R.
thermophilum and W. marina.
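
The two homology criteria named above (E-value cutoff and minimum identity) amount to a simple filter over pairwise hits. The Python sketch below is a generic illustration and not the IMG phylogenetic profiler itself; the `Hit` record, the function `is_homolog` and the example hits (including the subject identifiers) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise protein hit (fields chosen for illustration only)."""
    query: str
    subject: str
    evalue: float
    identity: float  # percent identity


def is_homolog(hit: Hit, max_evalue: float = 1e-5, min_identity: float = 30.0) -> bool:
    """Apply both criteria: E-value at or below cutoff, identity at or above minimum."""
    return hit.evalue <= max_evalue and hit.identity >= min_identity


# Made-up example hits; subject names are placeholders.
example_hits = [
    Hit("rumeso_00001", "subject_A", 1e-40, 62.5),  # passes both criteria
    Hit("rumeso_00002", "subject_B", 1e-3, 45.0),   # E-value too high
    Hit("rumeso_00003", "subject_C", 1e-12, 24.0),  # identity too low
]
homologs = [hit for hit in example_hits if is_homolog(hit)]
```

Both conditions must hold at once, so a highly significant but low-identity hit is rejected just like a similar but statistically weak one.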



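[Figure 4: Venn diagram. Protein set sizes: R. mesophilum (5082), R. thermophilum (3288),
W. marina (4106). Region counts: R. mesophilum only, 1758; R. thermophilum only, 485;
W. marina only, 1229; R. mesophilum and R. thermophilum only, 466; R. mesophilum and
W. marina only, 540; R. thermophilum and W. marina only, 19; all three genomes, 2318.]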

Figure 4. Venn diagram depicting the intersections of protein sets (total
numbers in parentheses) of the two Rubellimicrobium species and W. marina.
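
The shared-gene proportions can be recomputed from the region counts of the Venn diagram. The following Python sketch assumes the counts as read from Figure 4; the abbreviations `Rm`, `Rt` and `Wm` and the helper names are illustrative.

```python
# Region counts read from Figure 4 (Rm = R. mesophilum,
# Rt = R. thermophilum, Wm = W. marina).
REGIONS = {
    frozenset({"Rm"}): 1758,
    frozenset({"Rt"}): 485,
    frozenset({"Wm"}): 1229,
    frozenset({"Rm", "Rt"}): 466,
    frozenset({"Rm", "Wm"}): 540,
    frozenset({"Rt", "Wm"}): 19,
    frozenset({"Rm", "Rt", "Wm"}): 2318,
}
CORE = REGIONS[frozenset({"Rm", "Rt", "Wm"})]


def total(genome: str) -> int:
    """Sum all regions that include the given genome."""
    return sum(n for members, n in REGIONS.items() if genome in members)


def core_share(genome: str) -> float:
    """Percentage of a genome's proteins that is shared by all three genomes."""
    return 100 * CORE / total(genome)


shares = {g: round(core_share(g), 1) for g in ("Rm", "Rt", "Wm")}
```

The region sums reproduce the totals given in parentheses in the figure, and the core shares come to about 45.6% for R. mesophilum, 70.5% for R. thermophilum and 56.5% for W. marina.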



Although both genomes differ significantly in size
(4.9 Mbp for R. mesophilum and 3.2 Mbp for R.
thermophilum), the proportions of genes per COG
category are very similar (Table 4 and [9]). The IMG
Abundance Profile [34] demonstrated some
differences, however. Enzymes for transport and
utilization of amino acids and polyamines (COG1173,
COG0747, COG3842) were present in higher
abundance in R. thermophilum, which is in
agreement with the results from wet-lab substrate tests
[1,3]. Large differences in the abundance of
proteins can be found within the class of transposases
(COG2801, COG3436, COG2936, COG0665,
COG0404). While R. thermophilum codes for two
transposases, more than 30 transposase genes
were identified in R. mesophilum. Combined with
the presence of the site-specific recombinase XerD
(involved in the recombination of plasmids [20]),
this indicates a high level of genetic recombination
within R. mesophilum. Furthermore, 23 genes
coding for RTX toxins and Ca2+-binding proteins
(COG2931) were found. These proteins are structurally
diverse and play an important role in the
colonization of various habitats and surfaces [50].
Additionally, 14 proteins of the xenobiotic-degrading
glutathione S-transferases were present in R.
mesophilum. The occurrence of these proteins may
enable the bacteria to grow in polluted areas.

Taxonomic note 

The G+C content of the genomic DNA of strain
MSL-20T is given in the species description as 72.3
mol% [1], which represents a discrepancy of more
than 2% from the value of 69.7 mol% deduced
from the genome sequence. In addition to the
deviating oxidase test result, this calls for an emendation of
the species description according to the proposal
of Meier-Kolthoff et al. [47].

Emended description of Rubellimicrobium 
mesophilum Dastager et al. 2008 

The description of the species Rubellimicrobium
mesophilum is the one given by Dastager et al.
2008 [1], with the following modifications.

Oxidase test is positive. The G+C content, rounded 
to zero decimal places, is 70%. 